How to Generate Population Density Dots

Motivation

Before we start!!!

Let’s skim through this blog post to understand hierarchical structure of Australian Statistical Geography Standard (ASGS) 3th Edition.

SA2-level population data is commonly used in urban planning to determine warehouse locations. However, fulfillment centers operate at the last-mile stage of the logistics chain, delivering directly to households and retail stores. Therefore, using finer-scale population data — at the SA1 or Mesh Block level — is more appropriate to determine their locations.

The yellow population dots in my report were built on SA1 level.

Inspiration

My inspiration comes from the map1 featured in Population and Housing in Regional Victoria 2020 Report. As far as I know, the original was created using ArcGIS - not open-source.

With some familiarity in R, dplyr, and sf packages — you can recreate similar dot-density plots at different statistical geography levels and resolutions, completely free of charge.

Execution

You need R, RStudio, and some packages to get started. If you are completely new, I recommend resources from RLadies Sydney (BasicBasics and CleanItUp units are must-read).

In R, Working with tabular data using dplyr package is straightforward, |>or%>% pipe operators and these functions: mutate, select, filter and group_by along with 4 types of join should get you far.

The most popular package to work with spatial data in R is sf. An sf object extends regular tabular data frames to include a geometry column, which contains the spatial information. Thus, a simple explanation on how to read shapefiles with sf should be enough to get you started. Then, those mentioned dplyr functions will come in handy.

1. Preparing data

This stage involves:

  • Retrieving population data from the Australian Bureau of Statistics (ABS): ABS provides population counts at the smallest level - Mesh Block, along with allocation file for mapping among levels, and shapefiles for each level. By joining these files2, a shapefile with population counts of any level can be created.

  • Choosing a resolution for the population dots (e.g., 1 dot = 100 people)

1SA1_count_shapefile |>
2  dplyr::mutate(n_dots = floor(Person / 100))  |>
3  dplyr::filter(n_dots > 0) -> SA_count_shapefile
1
SA1_count_shapefile includes population counts Person, and polygon Geometry for each SA1 block (see below).
2
n_dots is the number of population dots to generate per SA1 polygon, calculated as floor(Person / 100). This resolution (1 dot = 100 people) is chosen because most SA1s have between 200 and 800 people, resulting in 2 to 8 dots per block. Using floor() ensures we only generate whole dots and prevents overcounting.
3
Optionally, to reduce computation time, filter(n_dots > 0) is to skip blocks with zero population.

Output of this stage should be an sf object with polygons and number of dots for each polygon.

SA1_count_shapefile
Simple feature collection with 14987 features and 3 fields (with 1 geometry empty)
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 140.9619 ymin: -38.90962 xmax: 149.9762 ymax: -33.98064
Geodetic CRS:  GCS_GDA2020
# A tibble: 14,987 × 4
   SA1_COD     Person n_dots                                            geometry
   <chr>        <dbl>  <dbl>                                  <MULTIPOLYGON [°]>
 1 20101100101    437      4 (((143.8009 -37.55382, 143.8004 -37.55376, 143.800…
 2 20101100102    183      1 (((143.7977 -37.55071, 143.7978 -37.55075, 143.798…
 3 20101100105    382      3 (((143.8156 -37.5587, 143.8169 -37.55885, 143.8175…
 4 20101100106    584      5 (((143.801 -37.55383, 143.8024 -37.55399, 143.8032…
 5 20101100107    363      3 (((143.8129 -37.55643, 143.813 -37.5561, 143.8132 …
 6 20101100108    792      7 (((143.8089 -37.55476, 143.809 -37.55478, 143.8102…
 7 20101100109    528      5 (((143.8044 -37.55316, 143.8046 -37.55213, 143.804…
 8 20101100110    512      5 (((143.7886 -37.54654, 143.7882 -37.54637, 143.786…
 9 20101100111    371      3 (((143.7923 -37.54826, 143.7922 -37.54849, 143.792…
10 20101100112    424      4 (((143.7918 -37.5514, 143.7917 -37.55106, 143.7916…
# ℹ 14,977 more rows

2. Generating population dots

Generating population dots is the process to ramdomly sampling points within each polygon with st_sample() from sf package.

1sf::st_sample(SA1_count_shapefile, size = SA1_count_shapefile$n_dots)|>
2  sf::st_sf() -> points_sf
1
st_sample takes in an sf object with multipolygon geometry, size argument specifies the number of points to sample from each polygon.
2
Result is converted from a list of points to an sf object with st_sf().

Result should have geometry column representing point locations.

points_sf
Simple feature collection with 57470 features and 1 field
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 141.0321 ymin: -38.87474 xmax: 149.7605 ymax: -34.12203
Geodetic CRS:  GCS_GDA2020
# A tibble: 57,470 × 2
     FID             geometry
   <dbl>          <POINT [°]>
 1     0  (143.802 -37.55391)
 2     1  (143.804 -37.55111)
 3     2 (143.8023 -37.54956)
 4     3 (143.8039 -37.55215)
 5     4 (143.7962 -37.55324)
 6     5 (143.8123 -37.56792)
 7     6 (143.8129 -37.56038)
 8     7 (143.8142 -37.56515)
 9     8 (143.8009 -37.55428)
10     9 (143.7993 -37.55614)
# ℹ 57,460 more rows

3. Recreating the plot

My data processing stage results in SA1_count_shapefile and the sampling density dots process results in points_sf

Below is the code to recreate the plot at the beginning of this post

library(ggplot2)
library(ggtext)  

ggplot() +
  geom_sf(data = SA1_count_shapefile, fill = "grey95", color = "white", size = 0.1) +
  geom_sf(data = points_sf, color = "orange", size = 0.3, alpha = 0.7) +
  theme_minimal() +
  labs(
    title = "Population Density in Victoria",
    subtitle = "Each <span style='color:orange'>&#9679;</span> = 100 people",
    caption = "Source: ABS Census Data 2021"
  ) +
  theme(
    legend.position = "none",
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
    plot.subtitle = element_markdown(size = 12, hjust = 0.5),
    plot.caption = element_text(size = 9, hjust = 1)
  )

Footnotes

  1. Population density, Victoria 2016. Department of Environment, Land, Water and Planning (2020).
    ↩︎
  2. 4 types of JOIN. Trust-me-bro (n.d.).
    ↩︎